Load the dataset listings.csv, which includes information about Airbnb listings in Boston.
library(tidyverse)
listings <- read_csv("listings.csv")
Load your Mapbox token into the system environment.
Sys.setenv("MAPBOX_TOKEN" = "pk.eyJ1IjoiamN3YW5nNTg3IiwiYSI6ImNsdWtkNnprcTAydngyaWxsamFrcjg1NWQifQ.QWeeRhtZjVYJjD2f401m4g")
Use some EDA tool/s to analyze the price variable (may or may not be graphical).
Based on your observations, consider removing outliers.
Please analyze the outliers carefully: are they random? Do you think they represent errors?
# Boxplot to identify outliers
ggplot(listings, aes(y = price)) +
geom_boxplot() +
theme_minimal()
The boxplot shows a small number of extreme values far above the bulk of the price data. The listings priced around $10,000 are likely errors, since they are far higher than every other listing. We remove them with a cutoff of $9,500; the row counts below show that this drops 2 listings.
print(nrow(listings))
## [1] 2959
listings <- listings %>% filter(price <= 9500)
print(nrow(listings))
## [1] 2957
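Before dropping rows, it can help to list which values a standard rule would flag, so they can be inspected rather than judged from the plot alone. The following is a minimal sketch of the 1.5 × IQR upper fence (the same rule a boxplot uses for its whiskers). The `prices` vector is made up for illustration and is not the Boston data.

```r
# Hypothetical prices, for illustration only (not taken from listings.csv)
prices <- c(80, 95, 110, 120, 150, 175, 200, 250, 9999, 10000)

# Upper fence: third quartile plus 1.5 times the interquartile range
q <- quantile(prices, c(0.25, 0.75))
upper_fence <- q[[2]] + 1.5 * IQR(prices)

# Values above the fence are candidates for inspection, not automatic removal
flagged <- prices[prices > upper_fence]
print(flagged)
```

On the real data, the same comparison could be used inside `filter()` to view the flagged rows (host, room type, neighbourhood) and judge whether they look like errors.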
Create a plot that demonstrates the effect of neighborhood on price. Please transform the price variable with log base 10.
listings <- listings %>% filter(price > 0)
listings$log_price <- log10(listings$price)
ggplot_obj <- ggplot(
listings, aes(x = neighbourhood, y = log_price)) +
geom_boxplot() +
theme_minimal() +
labs(x = "Neighborhood", y = "Log-transformed Price (base 10)") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
ggplot_obj
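With many neighborhoods on the x-axis, the boxplots are often easier to compare when the categories are sorted by their median price. A small sketch of that idea using base R's `reorder()` follows. The `toy` data frame is hypothetical and only illustrates the ordering step.

```r
# Hypothetical toy data, for illustration only
toy <- data.frame(
  neighbourhood = c("A", "A", "B", "B", "C", "C"),
  price = c(100, 120, 300, 340, 50, 70)
)
toy$log_price <- log10(toy$price)

# Reorder the factor levels by the median log price within each neighbourhood
toy$neighbourhood <- reorder(factor(toy$neighbourhood), toy$log_price, FUN = median)
print(levels(toy$neighbourhood))
```

Applied to `listings`, the reordered factor could be mapped to `x` in place of the raw `neighbourhood` column.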
Next, we organize the price data on a Mapbox layer. Choose how to visualize the price data. Consider which type of base layer best helps you show what you want to show. Add interesting information to the tooltip. Make sure that everything that is visible is also desirable.
library(plotly)
# log_price was already added to listings above, so it is not recomputed here
mapbox_plot <- listings %>%
plot_mapbox() %>%
add_markers(x = ~longitude,
y = ~latitude,
color = ~as.factor(neighbourhood),
text = ~paste0("The price is $", price, "<br>",
"in ", neighbourhood, "<br>")) %>%
layout(
mapbox = list(
center = list(lat = 42.32, lon = -71.06),
zoom = 10
)
)
mapbox_plot
Use subplot() to combine the visualizations from 4 and 5
into a single plot. Place the plot from 4 on top and the one from 5 on
bottom. Figure out how to fine-tune subplot so that you can
see all the information you want to.
boxplot_plotly <- ggplotly(ggplot_obj)
combined_plot <- subplot(boxplot_plotly, mapbox_plot, nrows = 2, margin = 0.1, heights = c(0.4, 0.6))
combined_plot %>% layout(height = 800)
GPX is a popular format for storing GPS data. The file
mbta.gpx (retrieved from http://erikdemaine.org/maps/mbta/) includes waypoints of
all of the MBTA stations as well as routes of rapid transit lines:
subway / light rail (red, orange, blue and green lines), bus rapid
transit (silver line) and commuter rail.
The read_GPX() function from the tmaptools
package reads GPX files into sf objects in
R.
library(tmaptools)
mbta <- read_GPX("mbta.gpx")
Add all subway / light rail stations and tracks to the plot (red, orange, blue and green lines).
library(sf)
# Filter the tracks and add them to the map
mapbox_tracks_plot <- mapbox_plot %>%
add_sf(data = filter(mbta$tracks, grepl("Red Line", name)), color = I("black"), name = "Red Line") %>%
add_sf(data = filter(mbta$tracks, grepl("Orange Line", name)), color = I("black"), name = "Orange Line") %>%
add_sf(data = filter(mbta$tracks, grepl("Blue Line", name)), color = I("black"), name = "Blue Line") %>%
add_sf(data = filter(mbta$tracks, grepl("Green Line", name)), color = I("black"), name = "Green Line")
# Filter for stations related to the specified subway lines
subway_lines <- c("Red Line", "Orange Line", "Blue Line", "Green Line")
subway_stations <- filter(mbta$waypoints, grepl(paste(subway_lines, collapse="|"), type))
# Add the filtered stations to the map
mapbox_tracks_plot <- mapbox_tracks_plot %>%
add_sf(data = subway_stations, color = I("black"), text = ~paste0(name), hoverinfo = "text", name = "Stations")
mapbox_tracks_plot
Have the track for each line appearing with the appropriate color (red, orange, green, blue) and with a legend entry that allows hiding/showing it (instead of a legend entry for all lines together). You can consider writing a function that will save you much typing.
library(sf)
library(dplyr)
library(plotly)
# Filter the tracks and add them to the map
mapbox_tracks_plot <- mapbox_plot %>%
add_sf(data = filter(mbta$tracks, grepl("Red Line", name)), color = I("red"), name = "Red Line") %>%
add_sf(data = filter(mbta$tracks, grepl("Orange Line", name)), color = I("orange"), name = "Orange Line") %>%
add_sf(data = filter(mbta$tracks, grepl("Blue Line", name)), color = I("blue"), name = "Blue Line") %>%
add_sf(data = filter(mbta$tracks, grepl("Green Line", name)), color = I("green"), name = "Green Line")
# Filter for stations related to the specified subway lines
subway_lines <- c("Red Line", "Orange Line", "Blue Line", "Green Line")
subway_stations <- filter(mbta$waypoints, grepl(paste(subway_lines, collapse="|"), type))
# Add the filtered stations to the map
mapbox_tracks_plot <- mapbox_tracks_plot %>%
add_sf(data = subway_stations, color = I("black"), text = ~paste0(name), hoverinfo = "text", name = "Stations")
mapbox_tracks_plot
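The four near-identical `add_sf()` calls can be folded into a helper, as the prompt suggests. The sketch below is one possible shape for it. `add_line_tracks()` and `line_colors` are hypothetical names introduced here, and the helper assumes, as the code above does, that the track features have a `name` column containing the line name.

```r
# Sketch only: add_line_tracks() and line_colors are hypothetical names,
# not part of plotly or of the original solution.
line_colors <- c(
  "Red Line" = "red",
  "Orange Line" = "orange",
  "Blue Line" = "blue",
  "Green Line" = "green"
)

add_line_tracks <- function(p, tracks, line_colors) {
  for (line_name in names(line_colors)) {
    # Keep only the track features whose name mentions this line
    line_tracks <- tracks[grepl(line_name, tracks$name, fixed = TRUE), ]
    # One add_sf() call per line gives each line its own legend entry
    p <- plotly::add_sf(
      p,
      data = line_tracks,
      color = I(line_colors[[line_name]]),
      name = line_name
    )
  }
  p
}

# Intended use with the objects defined above:
# mapbox_tracks_plot <- add_line_tracks(mapbox_plot, mbta$tracks, line_colors)
```

Keeping the colors in a named vector means a new line (for example the Silver Line) would only need one extra entry rather than another copied call.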
KML is another popular format for storing GPS data. The
Boston_Neighborhoods.kml file (downloaded from https://data.boston.gov/dataset/boston-neighborhoods on
March 31st, 2021) contains the boundaries of all of Boston’s
neighborhoods. You can read KML data into an
sf object using the function st_read() from
the sf package.
boston_neighborhoods <- sf::st_read("Boston_Neighborhoods.kml")
## Reading layer `Boston_Neighborhoods' from data source
## `D:\56_STAT_633\HW7\Boston_Neighborhoods.kml' using driver `KML'
## Simple feature collection with 26 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -71.19125 ymin: 42.22792 xmax: -70.92278 ymax: 42.39699
## Geodetic CRS: WGS 84
Add the neighborhood boundaries to your map (with a legend entry that allows showing/hiding those).
mapbox_tracks_neighbor_plot <- mapbox_tracks_plot %>%
add_sf(
data = boston_neighborhoods,
split = ~Name,
stroke = I("black"), span = I(1),
text = ~paste(Name),
hoverinfo = "text"
)
mapbox_tracks_neighbor_plot
Organize the last plot with the plot from 4 in a meaningful way. You might need to tweak either so that all labels, titles etc. appear as you want them to.
combined_plot <- subplot(boxplot_plotly, mapbox_tracks_neighbor_plot, nrows = 2, margin = 0.1, heights = c(0.4, 0.6))
combined_plot %>% layout(height = 800)
Compute the distance between each Airbnb listing and the nearest T-stop. Add this quantity to the tooltip (hover). Add this relationship to the visualization that ties neighborhood and price.
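# Convert listings to sf; remove = FALSE keeps the longitude/latitude columns used by the map code
if (!inherits(listings, "sf")) {
listings <- st_as_sf(listings, coords = c("longitude", "latitude"), crs = 4326, remove = FALSE)
}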
if (!inherits(subway_stations, "sf")) {
subway_stations <- st_as_sf(subway_stations, coords = c("longitude", "latitude"), crs = 4326)
}
# Compute the distance between each Airbnb listing and the nearest T-stop
listings$distance_to_tstop <- st_distance(listings, subway_stations) %>% apply(1, min)
listings <- mutate(listings, distance_to_tstop = units::set_units(distance_to_tstop, "m"))
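As a sanity check on the `st_distance()` result, the nearest-stop distance can also be computed with a plain haversine formula in base R. This is a rough sketch under stated assumptions: `haversine_m()` is a hypothetical helper, the coordinates are made up, and a spherical Earth of radius 6371008.8 m is assumed, so values will differ slightly from the ellipsoidal distances sf reports.

```r
# Great-circle (haversine) distance in meters between lon/lat points (vectorized)
haversine_m <- function(lon1, lat1, lon2, lat2, radius = 6371008.8) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Hypothetical coordinates, for illustration only
listing_xy <- data.frame(lon = c(-71.06, -71.10), lat = c(42.36, 42.33))
stop_xy <- data.frame(lon = c(-71.06, -71.09), lat = c(42.36, 42.335))

# Listings-by-stops distance matrix, then the row-wise minimum
dist_matrix <- outer(
  seq_len(nrow(listing_xy)), seq_len(nrow(stop_xy)),
  function(i, j) {
    haversine_m(listing_xy$lon[i], listing_xy$lat[i],
                stop_xy$lon[j], stop_xy$lat[j])
  }
)
nearest_stop <- apply(dist_matrix, 1, which.min)
nearest_dist_m <- apply(dist_matrix, 1, min)
```

The row-wise minimum mirrors the `apply(1, min)` step above; `nearest_stop` additionally records which stop is closest, which could be used to show the station name in the tooltip.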